- 
                Notifications
    
You must be signed in to change notification settings  - Fork 101
 
          ⚡ Improve Engine Performance and Implementation
          #578
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
          
     Draft
      
      
            shaneahmed
  wants to merge
  170
  commits into
  develop
  
    
      
        
          
  
    
      Choose a base branch
      
     
    
      
        
      
      
        
          
          
        
        
          
            
              
              
              
  
           
        
        
          
            
              
              
           
        
       
     
  
        
          
            
          
            
          
        
       
    
      
from
dev-define-engines-abc
  
      
      
   
  
    
  
  
  
 
  
      
    base: develop
Could not load branches
            
              
  
    Branch not found: {{ refName }}
  
            
                
      Loading
              
            Could not load tags
            
            
              Nothing to show
            
              
  
            
                
      Loading
              
            Are you sure you want to change the base?
            Some commits from the old base branch may be removed from the timeline,
            and old review comments may become outdated.
          
          Conversation
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
    - Use `pyproject.toml` for `bdist_wheel` configuration
- Improve `Engines` performance and implementation
          Codecov Report❌ Patch coverage is  Additional details and impacted files@@             Coverage Diff             @@
##           develop     #578      +/-   ##
===========================================
- Coverage    99.27%   94.72%   -4.56%     
===========================================
  Files           71       73       +2     
  Lines         9162     9235      +73     
  Branches      1195     1208      +13     
===========================================
- Hits          9096     8748     -348     
- Misses          40      452     +412     
- Partials        26       35       +9     ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
  | 
    
- Refactor engines_abc.py
Engines Performance and ImplementationEngine Performance and Implementation
      for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
# Conflicts: # tiatoolbox/utils/misc.py
# Conflicts: # tests/models/test_feature_extractor.py # tiatoolbox/models/models_abc.py
# Conflicts: # tiatoolbox/cli/common.py # tiatoolbox/cli/nucleus_instance_segment.py # tiatoolbox/cli/patch_predictor.py # tiatoolbox/models/engine/semantic_segmentor.py
* ⚡ Make WSIPatchDataset Pickleable to Support Windows Multithreading (#947) This PR makes the WSIPatchDataset class picklable by delaying the creation of the reader object until the first call to `__getitem__`. This enables the use of multiple loader workers on Windows without errors and provides significant performance improvements. - Delays reader object instantiation to the first `__getitem__` call instead of during initialization - Extracts reader creation logic into a separate `_get_reader` method - Stores image path and mode as instance variables for lazy initialization Speedup for the WSI prediction cell of the patch_prediction example notebook: 2min 48 sec with 0 loader workers -> 1min 13 sec with 4 workers. Note: this PR doesn't have any effect for Linux as the multi-threading already works fine there because Linux multithreading doesn't require things to be pickleable * 🔀 Merge branch develop into dev-engine-abc * 🐛 Fix reader_info read --------- Co-authored-by: Mark Eastwood <[email protected]>
# Conflicts: # tiatoolbox/models/dataset/classification.py
# Conflicts: # tests/models/test_patch_predictor.py
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
# Conflicts: # tests/models/test_feature_extractor.py # tests/models/test_multi_task_segmentor.py # tests/models/test_nucleus_instance_segmentor.py # tests/models/test_patch_predictor.py # tests/models/test_semantic_segmentation.py # tiatoolbox/models/architecture/__init__.py
## Summary of Changes ### Major Additions - **Dask Integration:** - Added `dask` as a dependency and integrated Dask arrays and lazy computation throughout the engine and patch predictor code. - Added Dask-based merging, chunking, and memory-aware processing for large images and WSIs. - **Zarr Output Support:** - Added support for saving model predictions and intermediate results directly to Zarr format. - New CLI options and internal logic for Zarr output, including memory thresholding and chunked writes. - **SemanticSegmentor Engine:** - Added a new `SemanticSegmentor` engine with Dask/Zarr support and new test coverage (`test_semantic_segmentor.py`). - Added CLI entrypoint for `semantic_segmentor` and removed the old `semantic_segment` CLI. - **Enhanced CLI and Config:** - Added CLI options for memory threshold, unified worker options, and improved mask handling. - Updated YAML configs and sample data for new models and test images. - **Utilities and Validation:** - Added utility functions for minimal dtype casting, patch/stride validation, and improved error handling (e.g., `DimensionMismatchError`). - Improved annotation store conversion for Dask arrays and Zarr-backed outputs. - **Changes to `kwarg`** - Add `memory-threshold` - Unified `num-loader-workers` and `num-postproc-workers` into `num-workers` - Removed `cache_mode` as cache mode is automatically handled. --- ### Major Removals/Refactors - **Removed Old CLI and Redundant Code:** - Deleted the old `semantic_segment.py` CLI and replaced it with `semantic_segmentor.py`. - Removed legacy cache mode and patch prediction Zarr store tests. - **Refactored Model and Dataset APIs:** - Unified and simplified model inference APIs to always return arrays (not dicts) for batch outputs. - Refactored dataset classes to enforce patch shape validation and remove legacy “mode” logic. - **Test Cleanup:** - Removed or updated tests that relied on old APIs or cache mode. - Refactored test assertions for new output types and Dask array handling. - **API Consistency:** - Standardized function and argument names across engines, CLI, and utility modules. - Updated docstrings and type hints for clarity and consistency. --- ### Notable File Changes - **New:** - `tiatoolbox/cli/semantic_segmentor.py` - `tests/engines/test_semantic_segmentor.py` - **Removed:** - `tiatoolbox/cli/semantic_segment.py` - Old cache mode and patch Zarr store tests - **Heavily Modified:** - `engine_abc.py`, `patch_predictor.py`, `semantic_segmentor.py` - CLI modules and test suites - Dataset and utility modules for Dask/Zarr compatibility --- ### Impact - Enables scalable, parallel, and memory-efficient inference and output saving for large images. - Simplifies downstream analysis by supporting Zarr as a native output format. - Lays the groundwork for further Dask-based optimizations in TIAToolbox. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  
    Sign up for free
    to join this conversation on GitHub.
    Already have an account?
    Sign in to comment
  
      
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
Enginesperformance and implementationmypyType Checks forcli/common.pyPatchPredictorEngine based onEngineABCreturn_probabilitiesoption to Paramsmerge_predictionsoption inPatchPredictorengine.post_process_cache_modewhich allows running the algorithm onWSIinfer_wsifor WSI inferencesave_wsi_outputas this is not required after post processing.merge_predictionsand fixes docstring in EngineABCRunParamscompile_modelis now moved to EngineABC init_calculate_scale_factorclass_dictdefinition._get_zarr_arrayis now a public functionget_zarr_arrayinmiscpatch_predictions_as_annotationsruns the loop onpatch_coordsinstead ofclass_probs